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Abstract 

We  describe  a  fast  algorithm  to  speed  up  rendering  of  scenes  for  walkthroughs  in  urban  environmen  ts .  Our  oc¬ 
clusion  culling  algorithm  takes  advantage  of  temporal  coherence  in  image  space.  As  such,  occlusion  calculation 
is  performed  online  only  when  needed .  This  enables  us  to  employ  intelligent  occluder-selection  and  culling  al¬ 
gorithms.  We  do  not  preprocess  visibility  information  or  pre-select  occluders.  Therefore,  we  can  update  scenes 
dynamically  at  a  little  cost.  The  algorithm  features  a  tradeoff  between  accuracy  and  efficiency.  While  it  approxi¬ 
mates  visibility  testing,  our  experiments  show  that  errors  occur  rarely. 


1.  Introduction 

In  urban  walkthroughs,  a  user  virtually  navigates  through  a 
3D  city  model  as  a  pedestrian  or  as  an  auto  driver.  Optimally, 
we  would  like  interactive  rendering  of  30  to  60  frames  a 
second.  Unfortunately,  data  gathering  techniques  have  out¬ 
stripped  advances  in  rendering  hardware,  making  interactive 
rendering  of  massive  data  sets  impossible  without  reducing 
the  number  of  primitives  rendered  at  each  frame.  Occlusion 
culling  is  one  popular  technique  for  this  reduction. 

Occlusion  culling  is  especially  suitable  for  urban  environ¬ 
ments  since  the  scenes  are  usually  densely  occluded.  How¬ 
ever,  the  characteristics  of  urban  environments  also  raise 
several  challenging  issues  for  occlusion  culling  algorithms. 
First,  large  amounts  of  objects  in  urban  environments  are 
hidden  by  the  combination  of  several,  not  necessarily  con¬ 
nected  occluders,  therefore,  the  effect  of  multiple  occluders 
—  occluders  fusion  —  has  to  be  considered  for  a  city  model. 
Second,  most  buildings  in  city  models  are  of  similar  sizes,  so 
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Figure  1:  Visualization  of  our  algorithm  on  a  Manhattan 
city  model  with  27,400  polygons,  (a)  is  a  bird-eye  view,  (b) 
is  a  view  along  a  street,  (c)  shows  light  grey  city  map  over¬ 
lay  ed  with  black  view  frustum,  dark  grey  culled  objects,  and 
black  occluders.  In  this  view,  our  algorithm  culls  88%  of  the 
polygons. 


a  few  occluders  seldom  suffice  and  occluder-selection  meth¬ 
ods  only  relying  on  heuristics  such  as  size  and  distance  may 
fail  to  capture  significant  occlusion  in  a  city  model  (see  Fig- 
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Figure  2:  None  of  the  buildings  inside  region  B  has  large 
size  or  is  close  to  the  viewpoint ,  but  they  together  form  good 
occlusion. 


ure  2).  Besides,  urban  models  are  always  deep;  namely  at 
most  viewpoints,  a  significant  number  of  buildings  are  far 
away  from  the  viewpoint.  As  the  viewpoint  moves  contin¬ 
uously,  faraway  buildings  that  are  occluded  remain  so  for 
a  “long”  period  of  time.  Thus  strong  temporal  coherence 
—  the  scene  does  not  change  much  within  two  consecutive 
frames  —  exists  in  urban  walkthrough  applications. 

In  this  paper,  we  present  a  simple  and  fast  occlusion- 
culling  algorithm  for  urban  environments.  The  algorithm  se¬ 
lects  the  occluders  based  on  a  novel  measure  of  importance. 
The  key  features  of  our  algorithm  are  its  selection  of  effec¬ 
tive  occluder  and  its  exploitation  of  temporal  coherence  by 
means  of  occluder  set  shrinkage.  The  algorithm  performs  oc¬ 
clusion  culling  in  only  a  small  subset  of  all  the  frames  due  to 
the  utilization  of  temporal  coherence.  The  shrinking  allows 
tradeoff  between  accuracy  and  efficiency.  For  the  purposes 
of  this  paper,  we  assume  the  input  model  to  be  2.5D,  al¬ 
though  our  algorithm  can  be  extended  to  the  3D  case.  Given 
a  hierarchical  representation  of  the  scene,  the  proposed  al¬ 
gorithm  does  not  require  any  pre-processing  or  prior  knowl¬ 
edge  about  the  walkthrough  path.  It  computes  and  maintains 
the  occluder  set  and  the  necessary  visibility  information  in 
an  on-line  fashion,  and  can  update  the  scene  dynamically. 
The  algorithm  is  simple  and  can  be  integrated  with  most  ex¬ 
isting  occlusion-culling  algorithms  to  improve  their  culling 
rate  at  little  extra  cost. 

The  resulting  algorithm  has  been  implemented  on  a  SGI 
Octane  Mips  R 10000  platform,  and  tested  on  both  static  and 
simulated  dynamic  environments.  Considerable  speedup  in 
both  culling  rate  and  overall  frame  rate  has  been  achieved, 
as  demonstrated  by  the  experimental  results. 

The  remainder  of  the  paper  is  organized  as  follows.  Sec¬ 
tion  2  gives  a  review  of  previous  work.  Section  3  describes 
the  outline  of  the  algorithm,  while  Sections  4-6  elaborate 
on  the  key  stages  of  our  algorithm.  Section  7  presents  the 
results  and  performance  analysis,  and  finally,  Section  8  con¬ 
cludes  by  discussing  future  work  and  open  problems. 

2.  Previous  Work 

Cohen-Or  et  al.  4  survey  recent  results  on  occlusion  culling 
and  visibility.  In  what  follows,  we  distinguish  between  two 


classes  of  occlusion-culling  algorithms:  preprocessing  ap¬ 
proaches  and  online  approaches. 

Preprocessing  methods  typically  partition  the  view  space 
into  cells,  then  pre-compute  and  store  visibility  information 
for  each  region  6* I2t  15’  I7>  18« 19*20.  Occluders  fusion  is  inher¬ 
ently  difficult  to  be  computed  for  a  view  cell,  and  some  ap¬ 
proaches  exploit  the  idea  of  “virtual”  occluders 2>  9- 16.  For  the 
special  case  of  urban  walkthroughs,  Cohen-Or  et  al.5  pro¬ 
vide  a  modeling  method  for  densely  occluded  city  data  sets 
and  pre-compute  hidden  buildings  for  each  view  cell.  Wonka 
et  al. 22  apply  a  shrinking  idea  in  the  object  space  and  cull 
maps  in  the  image  space  to  perform  visibility  preprocessing. 
The  above  approaches  are  fast  during  real-time  applications, 
but  are  considerably  costly  with  respect  to  time  and  mem¬ 
ory  during  the  preprocessing,  and  may  not  be  generalized  to 
dynamic  scenes  well. 

Unlike  the  above  preprocessing  approaches,  most  on-line 
methods  perform  little  preprocessing  and  apply  occlusion 
tests  and  culling  with  respect  to  current  viewpoint  at  each 
frame.  Many  of  them 3-7- 8- 11  preselect  a  few  large  occluders 
and  do  on-line  computation  in  object  space.  Zhang  et  al.  23 
use  the  image-space  idea  and  store  the  “opacity”  information 
into  a  hierarchical  occlusion  map,  which  are  generated  with 
the  help  of  texture-mapping  graphics  hardware,  though  they 
still  need  to  preselect  a  set  of  occluders  (see  also 10).  Wonka 
et  al. 21  apply  a  similar  cull  map  idea  using  z-buffer  to  urban 
environments  and  allow  dynamic  occluders  selection.  One 
different  approach  proposed  by  Klosowski  et  al.  13t  14  is  to 
render  on  a  budget  (on  demand)  using  a  novel  prioriti zed- 
layered  projection  technique.  Our  algorithm  takes  a  similar 
idea  for  occluder  selection. 

In  some  ways,  our  algorithm  is  similar  to  the  approach  of 
Wonka  et  al . 22  —  we  also  use  the  shrinking  idea  and  focus 
on  urban  walkthroughs.  But  as  illustrated  later  in  the  paper, 
we  provide  a  much  faster  on-line  shrinking  process  in  the 
image  space  and  solve  the  problem  caused  by  shrinking  ob¬ 
jects  separately.  As  in  some  earlier  approaches  9- i0-21’23,  our 
algorithm  utilizes  the  graphics  hardware  to  fuse  an  occlusion 
mask  to  do  culling  in  the  image  space.  But  we  take  further 
advantage  of  the  strong  temporal  coherence  existed  in  urban 
environments  and  perform  an  image-space  culling  only  once 
per  several  frames. 

3.  Algorithm  Outline 

We  propose  an  on-line  occlusion-culling  algorithm  for  walk¬ 
through  applications  in  dynamic  urban  environments.  Al¬ 
though  our  occlusion  culling  algorithm  might  generate 
wrong  pixels  in  the  resulting  image,  thus  not  being  con¬ 
servative,  we  can  obtain  a  tradeoff  between  accuracy  and 
culling  rate.  To  the  best  of  our  knowledge,  our  algorithm  is 
the  first  on-line  algorithm  to  utilize  the  temporal  coherence 
in  the  image  space.  The  algorithm  computes  a  hierarchical 
occlusion  mask  with  the  help  of  the  graphics  hardware,  and 
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Figure  3:  Occlusion  test:  (a)  shadow  frustum  in  object 
space;  (b)  mask  in  image  space. 


marks  a  subset  of  objects  that  are  occluded  by  the  mask. 
It  also  computes  a  conservative  estimate  of  the  time,  called 
time  stamp ,  until  when  all  of  these  objects  would  remain  oc¬ 
cluded.  Therefore  all  objects  whose  time  stamps  are  greater 
than  the  current  time  are  currently  occluded.  Since  we  use 
“time”  instead  of  one  bit  to  mark  the  occluded  objects,  we  do 
not  have  to  unmark  them  when  they  are  no  longer  occluded 
—  we  simply  compare  their  time  stamps  with  the  current 
time.  Our  algorithm  also  supports  dynamic  insertions  and 
deletions  of  new  objects  during  the  walkthrough. 

Most  of  the  early  culling  algorithms  perform  occlusion 
culling  in  the  object  space,  where  all  objects  inside  the 
shadow  frustum  formed  by  the  occluder  and  the  viewpoint 
are  culled.  This  approach  becomes  impractical  when  there 
are  relatively  large  number  of  occluders,  and  occluders  fu¬ 
sion  is  required.  We  extend  the  image-space  approach  pro¬ 
posed  by  Green  10  to  do  visibility  tests  using  occlusion 
masks. 

Occlusion  masks:  For  a  single  viewpoint  v,  an  occlusion 
mask  is  a  regular  2D  grid  on  the  image  plane.  Each  cell 
stores  the  maximum  depth  value  of  all  the  objects  visible 
inside  this  cell,  where  depth  refers  to  the  distance  of  the  ob¬ 
ject  from  the  current  viewpoint.  An  object  O  is  occluded  by 
a  mask  M  if  for  any  point  p  on  0,  the  depth  of  p  is  greater 
than  the  value  stored  in  the  cell  on  the  mask  intersected  by 
the  segment  vp.  See  Figure  3  (b).  In  our  implementation,  an 
occlusion  test  is  performed  by  an  overlap  test  followed  by  a 
depth  test. 

Critical  frames:  As  we  mentioned  earlier,  at  some  of  the 
frames  our  algorithm  recomputes  the  visibility  information 
and  masks  a  subset  of  the  objects  that  are  not  visible  for 
several  consecutive  frames.  We  call  these  frames  critical  In 
current  implementation,  the  critical  frame  happens  when  (i) 
real  time  reaches  the  value  of  the  time  stamp,  (ii)  a  dramatic 
change  happens  in  the  view-direction,  or  (iii)  an  occluder  is 
deleted  from  the  scene. 

We  preprocess  the  set  of  input  polygons  and  store  them 
into  a  kd-tree  T,  which  induces  a  hierarchical  subdivision  of 
the  object  space.  The  algorithm  then  does  the  following  at 
each  frame: 


I.  If  it  is  a  critical  frame,  it  performs  the  occlusion  marking 
operation  as  follows: 

(1.1)  Choose  a  set  of  occluders. 

(1.2)  Apply  the  image-space  shrinking  algorithm. 

(1.3)  Generate  a  hierarchical  occlusion  mask. 

(1.4)  Compute  a  time  stamp  and  mark  all  objects  lying  be¬ 
hind  the  mask  with  this  value  of  the  time  stamp. 

II.  For  any  frame: 

(II.  1)  If  the  environment  has  changed,  then  perform  the  dy¬ 
namic  update  algorithm. 

(II.2)  Pass  all  objects  that  are  inside  the  view  frustum  and 
whose  time  stamps  are  less  than  the  current  time  to  the 
graphics  hardware. 

4.  Preprocessing  of  the  Input 

Let  S  be  the  set  of  input  polygons.  Since  a  city  model  is 
2.5-dimensional,  namely  all  objects  are  placed  on  top  of  a 
ground  plane,  we  use  a  invariant  of  2D  kd-tree  to  store  the 
xy-projections  of  the  polygons  in  S ,  along  with  the  height 
information.  The  data  structure  is  constructed  as  follows. 

For  each  polygon  A  €5,  let  B&  be  the  smallest  orthog¬ 
onal  box  containing  A,  and  let  pa  be  the  xy-projection  of 
the  bottom-left  comer  of  B&.  We  construct  a  2D  kd-tree  T 
on  the  point  set  P  =  {pa  |  A  €  S}.  Each  node  x  of  T  is 
associated  with  a  subset  Px  CP  and  a  rectangle  R%,  which 
is  the  smallest  enclosing  rectangle  of  Px.  We  also  associate 
a  3D  box  BXy  which  is  the  smallest  orthogonal  box  contain¬ 
ing  all  polygons  A  such  that  pa  €  Fr.  For  the  root  u  of  T, 
Pu-P  and  Ru  =  IR 2.  If  |Fx|  is  less  than  a  certain  parameter 
p,  then  x  is  a  leaf.  At  each  leaf  x,  we  store  the  set  of  polygons 
{A  |  pa  €  Fx}.  Otherwise,  we  choose  an  axis-parallel  line 
lx,  called  splitter ,  and  partition  Rz  into  two  rectangles,  each 
of  which  is  associated  with  a  child  of  x.  Points  of  Px  lying  in 
each  of  the  rectangles  are  associated  with  the  corresponding 
child  of  x. 

There  are  many  possibilities  of  choosing  a  splitter  lx.  We 
could  simply  bisect  the  points  in  Px  and  alternate  between 
horizontal  and  vertical  splitters,  or  we  could  use  a  more  so¬ 
phisticated  method. 

Note  that  the  rectangles  associated  with  the  leaves  of  T 
are  disjoint,  but  the  jcy-projections  of  the  bounding  boxes  as¬ 
sociated  with  the  leaves  could  intersect  because  they  bound 
the  polygons.  Each  polygon  is  stored  at  only  one  leaf.  The 
interior  nodes  do  not  store  the  polygons.  The  total  size  of 
the  data  structure  is  0(n),  where  n  is  the  number  of  input 
polygons. 

5.  Occlusion  Marking  Operation 

At  each  critical  frame,  the  algorithm  first  performs  an 
occlusion-marking  operation.  The  goals  of  this  step  are  to 
take  advantage  of  the  temporal  coherence,  to  choose  a  set  of 
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Figure  4:  The  relationship  between  shrinkage  in  image- 
space  and  in  object-space. 


(0  ie\ 


Figure  5:  Shrinking  the  union  of  projections  may  cause  a 
leak  as  shown  in  (b).  But  this  leak  will  not  cause  error  on 
the  resulting  image  if  other  objects  (as  C  in  (d))  or  other 
occluders  (as  D  in  (e))  still  cover  it. 


occluders,  and  to  generate  the  hierarchical  occlusion  masks 
such  that  the  marked  objects  would  remain  occluded  for  sev¬ 
eral  subsequent  frames. 


5.1.  Temporal  coherence 

Let  v  be  the  current  viewpoint,  and  let  B§(v)  be  a  ball  of 
radius  5  centered  at  v.  For  an  object  a,  the  shrinking  of  O 
by  8  is  the  Minkowski  difference  oQB§  =  {x  \  Bs(x)  C  a}. 
The  following  lemma  is  straightforward. 


5.2.  Image-space  shrinking  and  fusing 

Shrinking  each  occluder  in  object  space  separately  has  two 
major  disadvantages:  (i)  it  is  expensive  and  complicated 22 ; 
and  (ii)  unnecessary  leaks  may  appear  if  two  connected  poly¬ 
gons  are  shrunk  independently.  The  problems  are  further  ex¬ 
acerbated  if  the  model  is  finely  tessellated  or  if  the  input  is 
given  as  a  set  of  polygons  without  any  connectivity  informa¬ 
tion  between  them.  Our  algorithm  instead  shrinks  the  union 
of  the  projections  of  occluders  on  the  image  plane.  Thus  the 
algorithm  puts  no  restrictions  on  the  input  data  and  preserves 
the  connectivity  between  polygons  during  shrinking. 


Conservative  Visibility  Lemma:  Let  £  be  an  occlud¬ 
ers  set,  and  let  v  be  the  viewpoint,  l!  =  {<7©  B§  |  a  £  £}.  If 
a  point  p  is  occluded  from  v  by  T! ,  then  p  is  occluded  by  £ 
from  any  v'  £  #5(v). 

The  lemma  suggests  that  if  we  use  £'  instead  of  £  to  do 
occlusion  culling  at  the  viewpoint  v,  the  culling  would  re¬ 
main  valid  as  long  as  the  viewpoint  remains  inside  £5(v). 
Shrink  occluder  P  by  8,  and  let  Pf  be  the  new  polygon.  If  a 
polygon  Q  is  occluded  by  Pf  and  the  viewpoint  is  moving 
with  a  maximum  velocity  V  at  the  moment,  then  Q  will  re¬ 
main  occluded  by  P  for  the  next  8/V  time  units.  In  such  a 
case,  one  can  set  the  time  stamp  to  tc  +  8/V,  where  tc  is  the 
current  time. 


In  the  image  space,  how  much  the  projection  of  a  polygon 
P  shrinks  as  we  shrink  P  by  8  in  the  object  space  depends  on 
the  distance  between  P  and  the  viewpoint.  More  precisely, 
let  the  image  resolution  b  esxs,  the  minimum  distance  be¬ 
tween  P  and  v  be  D,  the  angle  formed  by  the  view  frustum  be 
0,  and  let  A  be  the  amount  by  which  we  shrink  the  projection 
of  P  (see  Figure  4).  Then 


8  >  2A- 


D 

jtan(9/2)  ’ 


(1) 


The  analysis  indicates  that  by  shrinking  the  projection  of 
each  occluder  P  in  the  image  space  by  A,  the  resulted  image, 
i.e.,  fusing  the  shrunk  projection  into  one  occlusion  mask,  is 
conservative  in  visibility  for  all  viewpoints  inside  B§(v). 


Shrinking  the  union  may  cause  error  sometimes,  such  as 
illustrated  in  Figure  5.  Object  A  is  in  front  of  Object  B,  but 
their  projections  overlay  each  other.  The  result  of  shrink¬ 
ing  them  separately  is  depicted  in  Figure  5  (c),  which  con¬ 
tains  a  small  leak  where  a  third  object  could  be  visible  if  no 
other  objects  were  covering  this  slit  and  thus  hiding  the  ob¬ 
ject  from  the  viewpoint.  However,  if  one  shrinks  the  union 
of  the  two  objects  together,  as  depicted  in  Figure  5  (b),  the 
leak  does  not  appear  and  the  resulting  visibility  is  thus  ap¬ 
proximate.  The  maximum  size  of  each  error  on  the  resulting 
image  is  bounded  by  the  shrinkage  A. 

However,  if  we  consider  the  visual  error,  i.e.,  the  mis- 
drawn  pixels  on  the  resulting  image,  the  slit  as  illustrated  in 
Figure  5  (c)  would  produce  little  or  no  error  since:  (i)  as  will 
explain  later,  the  occluders  chosen  by  the  algorithm  are  rel¬ 
atively  far  from  the  viewpoint,  there  is  high  probability  that 
closer  objects  would  occlude  the  slit  (see  Figure  5  (d));  (ii) 
we  choose  a  “thick”  layer  of  occluders  (we  will  address  this 
issue  in  Section  5.3),  thus  other  occluders  may  cover  this  slit 
(see  Figure  5  (e));  (iii)  A  is  small  and  thus  the  visible  portion 
through  the  slit  is  tiny;  and  (iv)  the  user  is  generally  walking 
along  the  streets,  and  buildings  along  the  streets  and  objects 
close  to  him  are  the  focus  of  the  view,  while  the  errors  occur 
in  places  of  less  visual  importance.  So,  although  larger  A  po¬ 
tentially  allow  larger  slits,  as  we  will  see  in  Section  7,  errors 
rarely  occur. 

Though  larger  A  means  more  possible  errors  and  less  cov¬ 
erage  on  the  occlusion  mask,  it  can  increase  the  frame  rate 
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(the  speed),  until  it  reaches  the  point  that  it  damages  the 
culling  rate  more  severely.  So  A  is  an  important  parameter 
in  the  tradeoff  between  accuracy  and  speed. 

5.3.  Occluder  selection 

We  use  the  following  criteria  to  design  the  occluder  selection 
algorithm:  (i)  the  selection  process  is  fast,  (ii)  R ,  the  number 
of  occluders  in  O ,  is  not  too  big,  (iii)  the  mask  generated  is 
well-covered,  and  (iv)  the  distance  between  an  occluder  and 
the  viewpoint  v  is  at  least  some  parameter  D  in  order  to  take 
advantage  to  temporal  coherence  (refer  to  Equation  (1)). 

Since  the  occlusion  mask  is  generated  in  the  image  space 
with  the  help  of  graphics  hardware,  our  algorithm  can  afford 
to  choose  a  relatively  large  set  of  occluders  than  allowed  by 
the  object-space  approaches.  The  influence  of  R  will  be  ad¬ 
dressed  in  Section  7.1.  Given  a  fixed  number  /?,  the  goal  of 
the  occluder-selection  algorithm  is  to  find  R  most  “impor¬ 
tant”  objects  among  all  the  objects  that  lie  inside  the  view- 
frustum  and  whose  minimum  distance  from  v  is  at  least  D. 
The  “importance”  of  an  object  or  a  node  in  the  kd-tree  refers 
to  their  contribution  to  the  occlusion  mask,  namely,  how  well 
they  occlude  objects  that  are  farther  away  from  the  view¬ 
point.  For  a  node  £  of  T,  let  <)>(£)  represent  the  inverse  of 
“importance”  value  of  a  kd-tree  node  thus  a  smaller  <J)(y 
means  node  %  is  more  important.  The  occluder-selection  al¬ 
gorithm  is  depicted  in  Figure  6.  The  algorithm  works  by 
maintaining  a  priority  queue  of  the  kd-tree  nodes  to  be  vis¬ 
ited,  based  on  the  value  of 


ALGORITHM  Occluder-selection(7\  v,  R) 
Input:  kd-tree  T,  viewpoint  v,  #occluders  R 
Output:  R  polygons  chosen  as  occluders. 

Insert  root(T)  into  Q\ 

while  (( Q  ^  0)  and  ( count  <  R)) 

%  -  get-min(0; 

if  (size(^)  is  small)  and  (mindist(£,  v)  >  D) 
Output  polygons  in  £  as  occluders; 
update  count ; 
else 

Insert  two  children  of  £  into  Q\ 

end  if 
end  while 

end  Occluder-selection 


Figure  6:  Occluder  selection  algorithm;  get -min(Q)  returns 
the  node  in  Q  with  the  minimum  <)>  value,  and  mindist(^v) 
returns  the  minimum  distance  between  £  and  v. 


We  can  simply  choose  <|>(£)  to  be  the  minimum  distance 
between  %  and  v.  This  is  equivalent  to  choosing  all  objects 
in  an  annulus  with  an  inner  radius  D;  the  size  of  the  annulus 
depends  on  the  value  of  R.  See  Figure  7  (a)  for  illustration. 
This  works  well  for  dense  models  with  almost  uniform  dis- 


Figure  7:  F  is  the  view  frustum.  D  is  the  minimum  distance. 
In  (a),  objects  in  the  dark  region  will  be  chosen  as  occlud¬ 
ers  based  on  distances.  In  (b),  buildings  in  region  A  will  be 
chosen  as  occluders  based  on  distances.  Buildings  in  B  are 
missed,  although  they  contribute  a  lot  to  the  occlusion  mask 
too. 


tribution  of  buildings. 

However,  it  ignores  the  objects  that  are  far  away  but  nev¬ 
ertheless  contribute  significantly  to  the  occlusion  mask.  Fig¬ 
ure  7  (b)  illustrates  one  such  scenario,  where  the  above  dis¬ 
tance  criteria  only  choose  the  building  in  region  A  even 
though  the  buildings  in  region  B  would  be  good  occluders. 

In  the  following,  we  consider  only  the  objects  that  are  in¬ 
side  the  view  frustum  and  are  at  least  D  distance  away  from 
v.  Intuitively,  the  definition  of  <(>(£),  for  a  node  £  in  the  kd- 
tree,  should  depend  on  how  many  polygons  closer  than  £ 
occlude  the  objects  stored  at  In  other  words,  if  %  covers  a 
spot  s  on  the  image  plane,  and  several  closer  polygons  (par¬ 
tially)  have  already  covered  s ,  then  £  is  not  a  good  candidate 
to  be  an  occluder,  as  it  potential  contribution  to  the  occlu¬ 
sion  of  5  is  small  (i.e.,  see  the  buildings  with  light  color  in 
Figure  7  (b)).  We  therefore  use  the  following  approach. 

We  divide  the  image  plane  into  cells,  and  assign  a  “cov¬ 
erage”  value  C{c)  for  each  cell  c.  At  any  moment  during  the 
occluder-selection  algorithm,  C{c)  is  defined  as  the  number 
of  polygons  encountered  so  far  whose  projections  overlap  c. 
For  a  kd-tree  node  £,  we  define 

<(>(^)  =  min  {C(c,)}  *mindist(£,v), 
i~\..k 

where  ciy  for  i  =  1,. ..  are  sampled  cells  that  the  pro¬ 
jection  of  £  covers,  and  mindist(£,  v)  returns  the  minimum 
distance  between  node  £  and  viewpoint  v.  In  the  current  im¬ 
plementation,  k  is  set  to  be  3.  More  precisely,  for  a  node  £ 
with  a  bounding  box  B^>  we  choose  3  points,  namely,  the 
center  point(pi)  of  B^  and  the  two  endpoints  (p2  and  pf) 
of  one  diagonal  of  B^>  and  c;  is  the  cell  that  ray  vpi  passes 
through.  Whenever  a  node  £,  is  added  to  the  occluders  set, 
each  cell  it  covers  updates  its  coverage  to 

C(a)new  -  C(ci)0id  +  (#  polys  in  £)/(#cells  £  covers). 

Figure  8  illustrates  the  idea  in  2D. 

We  have  implemented  both  methods  described  above  for 
computing  <()(£).  Figure  9  shows  the  different  outputs  of  the 
occluder-selection  algorithm  under  the  same  viewpoint  and 
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Figure  8:  The  3  light  cells  are  chosen  to  decide  <])(£)  for 
node  \  while  the  dark  cells  will  be  updated  if  polygons  in  £ 
are  outputed  as  occluders. 


Figure  9:  In  both  city  maps,  dark  polygons  are  occluders 
selected  by  the  algorithm.  The  value  of  <)>  is  defined  by  dis¬ 
tance  only  in  (a),  and  by  the  more  complicated  method  in 

(by 


view  direction,  using  those  two  different  versions  of  <J).  The 
occluder  selection  based  on  the  combined  criteria  is  slightly 
slower  than  the  distance  criteria.  However,  since  it  results  in 
better  culling  rate  and  the  time  spent  in  this  step  is  relatively 
insignificant  (refer  to  Figure  10),  it  overall  results  in  faster 
frame  rates. 

5.4.  Hierarchical  occlusion  mask 

Our  algorithm  sends  all  the  occluders  to  the  graphics  hard¬ 
ware  and  reads  the  contents  of  the  z-buffer.  The  background 
is  assumed  to  be  at  infinity  with  the  maximum  depth  value. 
We  call  the  non-background  regions  in  the  z-buffer  occluder 
regions ,  which  is  the  union  of  the  projections  of  occluders 
on  the  image  plane.  The  algorithm  shrinks  the  union  by  A 
and  then  computes  the  time  stamp  using  (1).  However,  since 
we  exploit  the  standard  OpenGL  rasterization,  the  z-buffer 
is  not  a  conservative  mask  for  current  viewpoint  due  to  the 
partially  covered  but  drawn  pixels  on  the  silhouette  edges. 
Similar  to  the  technique  proposed  by  Wonka  et  al.22,  our  al¬ 
gorithm  shrinks  one  extra  pixel  to  guarantee  that  only  fully 
covered  pixels  would  be  counted.  The  shrinking  operation 
mentioned  above  is  the  same  as  the  erosion  operation  in  im¬ 
age  processing  community. 

After  the  algorithm  shrinks  the  occluder  regions  in  the  im¬ 


age  acquired  from  the  z-buffer  by  A  + 1  pixels,  the  resulting 
image  serves  as  a  primary  occlusion  mask  Mq.  In  order  to 
accelerate  the  occlusion  test  operation  against  the  mask,  in 
the  implementation,  as  in  Zhang  23 ,  our  algorithm  uses  a  set 
of  hierarchical  masks  Mo,  M\ , . . . ,  Mm-  Starting  from  the  pri¬ 
mary  mask  Mq,  the  hierarchy  is  built  up  by  creating  lower 
resolution  versions  of  Mo-  We  fix  a  parameter  b ,  each  pixel 
in  Mt+i  is  obtained  by  combining  a  block  B  of  bxb  pixels 
of  The  value  for  this  pixel  in  M;+i  is  the  maximum  z- 
value  in  B,  which  guarantees  that  occlusion  tests  involving 
M,+i  are  conservative. 

Zhang  et  al. 5  accelerate  the  construction  of  their  hierar¬ 
chical  occlusion  maps  by  graphics  hardware  that  supports  bi¬ 
linear  interpolation  of  texture  maps.  The  step  is  made  faster, 
but  can  introduce  artifacts.  Our  conservative  mask  genera¬ 
tion  algorithm  is  slower  but  is  performed  at  only  a  subset  of 
all  the  frames. 


5.5.  Marking  algorithm 

We  traverse  the  T  in  a  top-down  manner  to  mark  the  oc¬ 
cluded  nodes  of  T  with  the  computed  time  stamp  .  At  each 
node  x,  we  use  the  hierarchy  of  masks  and  the  3D  bound¬ 
ing  box  Bx  to  speedup  the  occlusion  test.  The  algorithm  tra¬ 
verses  T  as  follows:  For  the  current  node  x,  let  B*  denote  the 
smallest  orthogonal  rectangle  enclosing  the  projection  of  the 
box  Bx  on  the  image  plane.  If  B*  is  occluded  by  the  masks, 
the  algorithm  marks  x.  Otherwise,  it  recursively  visits  the 
children  of  x.  In  order  to  determine  whether  B*  is  occluded, 
depending  on  the  size  of  B * ,  the  algorithm  first  selects  an  ap¬ 
propriate  level  of  mask  Mt*.  It  then  searches  the  hierarchy  of 
masks,  starting  from  M*,  with  B *  in  a  standard  manner.  We 
check  all  cells  b  €  Mi  that  intersect  B*- IfbcBt  is  not  cov¬ 
ered,  we  conclude  that  B *  is  not  occluded.  If  b  intersects  B * 
partially,  we  recursively  check  the  cells  of  M,-_  i  lying  inside 
b  to  determine  whether  B*  H  b  is  occluded  by  the  occluders. 

At  each  node  x  of  Ty  we  also  store  the  number  of  objects 
stored  in  the  subtree  rooted  at  x.  During  the  marking  step, 
instead  of  going  all  the  way  to  a  leaf,  we  stop  visiting  the 
descendants  of  a  partially  visible  node  if  only  few  objects 
—  below  a  chosen  threshold  p  —  are  stored  in  the  subtree 
because  the  time  spent  in  determining  the  visibility  of  these 
nodes  will  offset  the  time  saved  in  not  sending  the  occluded 
objects.  We  refer  to  the  threshold  p,  which  determines  where 
to  terminate  the  visibility  test,  as  leaf  size. 


6.  Dynamic  Update  Algorithm 

We  simulate  the  dynamic  scenes  by  inserting  or  deleting 
some  random  buildings  at  some  randomly  chosen  frames. 
A  “lazy”  approach  is  employed  to  perform  the  update.  We 
omit  the  details  here  and  related  experimental  results  later 
because  of  lack  of  space  from  current  short  abstract  version. 
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Figure  10:  The  time  consumed  by  each  sub-step  of  an  oc¬ 
clusion  marking  operation  for  the  4  models . 


7.  Implementation  and  Performance 

We  tested  the  performance  of  the  above  algorithm  on  an 
SGI  Octane  Mips  R 10000  with  a  196MHZ  CPU  and  128MB 
main  memory.  The  program  provides  a  GUI  for  the  user 
to  select  a  walkthrough  path.  All  of  our  testing  paths  are 
along  the  streets  since  this  is  the  most  realistic  case.  We 
demonstrate  the  performance  of  our  algorithm  on  four  sets 
of  city  models  of  Manhattan  suburb  and  middle  Manhattan. 
The  sizes  of  the  models  are  model  1  with  3,657  polygons, 
model  2  with  27,437  polygons,  model  3  with  109,748  poly¬ 
gons,  and  model  4  with  438,992  polygons.  All  of  them  con¬ 
sist  only  of  buildings,  and  each  building  is  composed  of  a 
few  polygons.  Most  polygons  are  quadrangles,  though  there 
are  also  a  small  number  of  polygons  with  more  than  4  ver¬ 
tices. 

7.1.  Analysis  of  parameters 

We  split  one  marking  operation  into  five  steps:  (i)  selecting 
occluders,  (ii)  drawing  occluders  and  reading  z-buffer  ,  (iii) 
shrinking  to  get  the  primary  mask  Mq,  (iv)  generating  the  hi¬ 
erarchical  occlusion  masks,  and  (v)  marking  the  kd-tree  T. 
Figure  10  shows  the  average  time  taken  by  each  sub-step  for 
four  different  data  sets.  Note  that  time  spent  by  sub-steps  (i) 
-(iv)  does  not  vary  much  as  the  size  of  data  sets  grows,  since 
it  is  mainly  determined  by  the  graphics  hardware  configura¬ 
tions.  The  time  of  the  last  step  dominates  the  overall  time. 
It  increases  as  the  dataset  becomes  larger  because  the  algo¬ 
rithm  has  to  do  occlusion  test  on  more  kd-tree  nodes  against 
the  mask  for  larger  data  sets. 

Several  crucial  parameters  are  involved  in  each  step,  such 
as  the  minimum/maximum  distance  of  the  occluders,  the  size 
and  the  number  of  levels  of  the  hierarchical  occlusion  masks, 
the  size  of  the  leaves  p  in  7\  and  the  shrinkage  A  in  image 
space.  The  final  frame  rates  are  determined  by  the  overhead 
of  the  occlusion  marking  operation  occurred  at  each  critical 
frame,  by  the  frequency  of  critical  frames,  and  by  the  culling 
rate.  All  these  parameters  have  a  compound  effect  on  the 
frame  rate.  We  performed  many  experiments  with  different 


shrinkage  (pixels)  shrinkage  (pixels) 

(a)  (b) 

Figure  11:  Shrinking  size  of  the  mask  can  decrease  the 
culling  rate  as  depicted  in  (a),  but  (b)  shows  that  it  still  en¬ 
hances  the  frame  rate  until  the  shrinkage  is  too  large. 


(a)  (b) 


Figure  12:  (a)  Number  of  error  pixels  for  each  frame  tested 
on  Sun  Ultra  5  .(b)  Number  ofmisdrawn  pixels  for  frames  in 
which  error  occurs ,  tested  on  a  SGI  Octane.  A  =  6  pixels  in 
both  tests. 


configurations  to  inspect  their  influence,  but  omit  the  results 
and  analyses  from  current  short  paper.  Interested  reader  are 
referred  to  the  full  version 1 . 

7.2.  Tradeoff  between  accuracy  and  speed 

Figure  1 1  (a)  depicts  how  A  affects  the  number  of  polygons 
sent  to  the  graphics  hardware  (culling  rate)  for  model  3  con¬ 
sisting  of  around  100,000  polygons.  As  the  value  of  A  in¬ 
creases,  it  has  little  affect  on  the  culling  rate  in  the  beginning, 
but  after  a  while,  it  starts  reducing  the  culling  rate  since  the 
coverage  on  the  occlusion  mask  is  too  sparse.  Eventually,  the 
number  of  polygons  sent  to  the  hardware  converges  to  the 
number  of  polygons  inside  the  view  frustum.  The  curve  in 
(b)  reflects  the  relationship  between  A  and  the  overall  frame 
rates.  As  the  figures  show,  before  A  reaches  the  point  that 
it  starts  to  reduce  the  culling  rate  significantly,  it  enhances 
the  frame  rates.  On  the  other  hand,  larger  A  potentially  allow 
more  error.  Thus  before  A  meets  the  threshold,  the  shrinking 
size  provides  a  way  to  achieve  tradeoff  between  accuracy 
and  speed. 

We  tested  the  size  of  error,  i.e.,  the  number  of  pixels  mis- 
drawn  on  the  resulting  image  compared  with  displaying  the 
scene  using  the  view-frustum  culling  algorithm.  Namely,  we 
read  the  z-buffer  and  frame  buffer  from  the  hardware  after 
applying  our  algorithm  and  the  view-frustum  culling  algo¬ 
rithm  respectively,  and  then  compare  these  buffer.  We  per¬ 
formed  the  same  algorithm  on  two  platforms,  a  SUN  Ultra  5 
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(a)  (b) 


Figure  13:  Light  grey  city  maps  are  overlapped  with  black 
view  frustum  and  black  occluders  in  both  figures.  Dark  grey 
building  in  (a)  are  marked  occluded  by  our  algorithm,  while 
dark  grey  boxes  in  (b)  are  the  bounding  boxes  of  the  marked 
nodes . 


and  a  SGI  Octane,  and  observe  very  different  number  of  mis- 
drawn  pixels.  Figure  12  (a)  shows  the  number  of  error  pixels 
at  each  frame  for  a  path  consisting  of  800  frames.  However, 
very  few  error  pixels  appear  for  the  same  path  when  tested 
on  the  SGI  platform.  So  we  instead  apply  the  algorithm  to 
a  long  path  (11,558  frames)  throughout  the  city  model  on 
the  SGI  machine.  Errors  only  appear  in  158  frames.  See  Fig¬ 
ure  12  (b).  Experiments  for  other  models  show  similar  pat¬ 
terns.  Furthermore,  we  did  not  observe  any  substantial  in¬ 
crease  in  the  visual  error  as  A  becomes  larger.  This  somewhat 
counterintuitive  behavior,  follows  from  the  fact  that  closer 
objects  and  other  occluders  cover  the  leaks  produced.  Also, 
note  that  our  analysis  is  rather  conservative,  and  as  such,  in 
practice,  it  is  rather  “pessimistic”  in  estimating  the  visual  er¬ 
rors. 

7.3.  Performance 

We  demonstrate  the  performance  of  the  algorithm  by  com¬ 
paring  the  culling  and  frame  rates  with  the  view  frustum- 
culling  algorithm.  Our  main  purpose  is  to  prove  the  effec¬ 
tiveness  of  our  algorithm,  so  the  algorithm  has  not  focused 
on  accelerating  the  frame  rates  using  graphics  hardware  such 
as  using  display  lists  or  triangle  strips.  Instead,  we  send  a  set 
of  polygons  in  each  frame. 

In  Figure  13,  the  viewpoint  is  on  a  very  long  street,  with 
the  view  direction  along  that  street.  Figure  13  (a)  shows  that 
the  culling  is  effective  since  most  unmarked  objects  are  those 
buildings  along  the  street  sides.  In  that  case,  other  culling  al¬ 
gorithm  would  only  do  worse  if  they  are  not  careful  about 
the  occluders  set.  The  efficiency  of  the  culling  is  shown  in 
Figure  13  (b)  where  the  dark  boxes  are  the  bounding  boxes 
of  the  nodes  marked  occluded  in  the  kd-tree.  Most  marked 
nodes  are  close  to  the  root,  except  for  those  whose  projec¬ 
tions  are  on  the  boundary  of  the  occluded  region.  In  this 
specific  case,  nodes  close  to  the  streets  and  near  the  view 
frustum  have  to  be  broken  down  to  lower  level  during  the 
marking. 

In  our  algorithm,  a  polygon  is  culled  either  because  it  is 


(a)  (b) 


Figure  14:  Culling  rates  achieved  by  our  algorithm  for  (a) 
model  2  (27,437  polygons)  and  (b)  model  4  (438,992  poly¬ 
gons).  In  both  plots,  the  upper  curve  shows  CRo  while  the 
lower  curve  shows  CRV . 


Data 

size 

Hardware 

only 

Viewfrustum 

culling 

Our 

algorithm 

3,657 

328.57 

58.93 

56.63 

27,437 

9.26 

14.78 

41.27 

109,748 

1.86 

3.26 

38.53 

438,992 

0.43 

1.17 

20.13 

Table  1 :  Frame  rates  using  ( i)  z-bujfer  directly,  ( ii)  viewfrus- 
tum  culling,  and  (iii)  our  algorithm.  The  unit  for  the  frame 
rate  is  frames/sec. 


outside  the  view  frustum,  or  alternatively,  an  ancestor  of  the 
leaf  storing  the  polygon  is  marked  occluded.  Let 

Cr  —  ^polygons  culled  by  the  algorithm 
”  ^polygons  inside  view  frustum  ’ 

r^R  —  ^polygons  culled  by  the  algorithm 
c  0  ~  ^polygons  in  tne  data  set  * 

The  value  of  CRV  shows  the  improvement  in  the  culling 
rate  performance  achieved  by  our  algorithm  over  the  view 
frustum  culling  approach,  while  CRV  refers  to  the  culling 
rate  compared  to  the  original  data  set.  The  graphs  depicted 
in  Figure  14  show  the  changes  of  these  two  culling  rates 
with  respect  to  time.  The  high  culling  rate  is  achieved 
because  the  algorithm  includes  most  of  the  “useful”  objects 
as  occluders. 

Table  1  shows  the  frame  rate  (averaged  over  800  frames) 
obtained  by  our  algorithm.  There  is  no  benefit  in  using  our 
new  algorithm  for  small  data  sets  due  to  the  overhead  at 
critical  frames.  The  big  gains  appear  when  our  algorithm 
is  applied  to  medium  to  large  inputs,  where  our  more  ag¬ 
gressive  culling  pays  off.  In  the  current  implementation,  the 
time  required  at  each  frame  is  not  balanced  since  a  critical 
frame  needs  to  perform  the  expensive  occlusion-marking  op¬ 
eration.  As  the  rendering  and  occlusion-marking  steps  can 
be  performed  independently,  we  can  use  multiple  threads  to 
perform  these  two  steps,  which  would  amortize  the  cost  of 
occlusion  marking  over  several  frames. 
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